ABSTRACT

A set of 96 samples composed of 6 subsets was evaluated by photographic
and direct observation. Using a set of subjective criteria for excellence,
the samples were rated by three different evaluators on both ordinal and
categorical scales. The resulting data sets were analyzed by
Kruskal-Wallis and STP procedures (for the ordinal data) and by Ridit
analysis (for the categorical data). It was found that the ordinal data
produced consistent results by both photographic and direct evaluation.
The Ridit analysis creates an informative index of excellence which
allows a quality level value to be given to an irregular distribution.
It is a practical method for producing data conducive to graphic
display. Certain pitfalls in Ridit analysis make more complex analysis
problematic.


In the areas of clinical or applied research where the data gathered
are of a categorical or ordinal nature, there exist major barriers to
successful analysis. A significant obstacle is the inability to
establish firm criteria for evaluation which are resistant to operator
imprecision, inter-operator differences, or imprecision in repeated
evaluations.

Much dental research can best be evaluated by a good-better-best
rating scheme which falls between strictly categorical data types (e.g.,
red, green, blue) and ranked or ordinal data. It is a constant source
of difficulty to investigators to develop simple criteria which may be
easily and universally applied to situations of clinical evaluation.

These clinical judgments are often the subjective balancing of many
subtle, and oftentimes undefined, variables.

Goldman et al.¹ refer to the low level of agreement amongst
examiners (<50%) in evaluating the subjective success or failure of
endodontic treatment diagnosed using radiographs. These same evaluators
agreed with their own subsequent evaluations at a much higher rate (72-88%).
The lack of sharply bounded standards for various levels of success
complicates evaluation of clinical studies in all aspects of dental
research.

Assuming that the "soft" criteria presently used by clinical
researchers are the best and most appropriate available, are there
certain schemes of data evaluation and analysis which are uncomplicated
in application, presentation, and understanding, yet are resistant to
the destructive variables previously mentioned?

The intent of this study was to examine the ability of "soft"
evaluative criteria, both categorical and rank-ordering rating schemes,
and different statistical analyses to separate experimental treatment
groups. At issue was the resistance of the methods to variations in
evaluator and time; i.e., would the methods yield the same scores for
the individual categories regardless of who did the evaluating, or when or
how it was done?

METHOD 

In a study involving the ability of various endodontic filling
techniques to reproduce the surface of a root canal, 96 specimens
produced by 6 endodontic filling techniques were evaluated by the 3
investigators. At the first evaluation session, using a ?5X binocular
microscope, each investigator categorized the surface of the filling
into 1 of 4 types, poor to excellent, using the simple criteria given
in Table 1. At this time, a 4 x 5 Polaroid photo was made of the
surface.

During the next week, each evaluator separately placed the photos
in rank order from 1-96, with #1 being the photo considered to have the
best replication of surface detail, and #96 the photo showing the worst
replication of surface detail.

One month later, the photos were evaluated and categorized into
one of the four types, as done with the microscope originally. At a
subsequent session, the 96 photos were again ranked from 1-96. At
each of the evaluation sessions, the technique group to which the
filling belonged was unknown to the evaluators, and the categorizations
and rankings were decided independently by each evaluator. This resulted
in two sets of data, ordinal and categorical.

The ordinal data were evaluated using the Kruskal-Wallis Analysis
of Variance by Ranks³ and a Simultaneous Test Procedure based on the
Mann-Whitney U Test. The categorical data (Tables 2a, b), based on the
classification of the observations into the four categories, were analyzed
by Ridit analysis.⁵

Ridit, or "reference to an identified distribution" (a specific data
set, in this case the entire 96 observations by each operator), is a
statistical technique that allows any type of distribution to be compared
to a reference distribution, much in the manner of χ² goodness-of-fit
tests. Unlike those tests, it makes use of the natural ordering present
in the data, which would otherwise be lost. In data sets where the
observations can be divided into categories which are sequentially
related, but in which further definition as to absolute rank is not
possible, or is of spurious value, the use of the traditional ordinal
tests is complicated by the necessity of correcting for multiple ties.
These tests may yield more approximate results, are difficult to
represent graphically, and yield no simple index of the value (sic) of a
sample in which the variability of the sample is shown (as in the mean
and standard deviation of interval measurements).

A  distinct  advantage  in  using  Ridit  analysis  is  that  the  relative 
value  of  different  treatments  can  be  estimated  by  their  relative  average 
ridit  value.  This  will  be  further  demonstrated  in  the  section  which 
explains  the  calculation  of  Ridit  values. 


The technique for converting data to Ridit values, which determines
the value of each category, is illustrated below using the distribution
assigned by Evaluator I in the original direct vision evaluation.

The total population (96 observations) forms a frequency distribution
which totals to 100%.

             (1)          (2)          (3)            (4)
             # in Each    % in Each    Midpoint of    Percentile of Midpoint
Category     Group        Group        Each Group     of Each Group

IV           38           39.58        19.79          80.21
III          28           29.17        14.59          45.83
II           23           23.96        11.98          19.27
I             7            7.29         3.65           3.65

Column 1 is the frequency of each cell. Column 2 is the relative
frequency expressed as a percent. Column 3 is half of the Column 2
percentage. Column 4 is the so-called "Ridit" value of each category,
which is the sum of the percentages of all the lower groups plus half
the cell percentage.
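The conversion described by these columns can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure: the function name `ridits` is hypothetical, and the counts are the reference frequencies listed above, ordered from Category I to Category IV.

```python
def ridits(counts):
    """Return the Ridit value of each category.

    counts: category frequencies ordered from the lowest category
    to the highest. Each Ridit is the proportion of observations in
    all lower categories plus half the proportion in the category.
    """
    total = sum(counts)
    values = []
    below = 0  # running count of observations in lower categories
    for n in counts:
        values.append((below + n / 2) / total)
        below += n
    return values


# Reference distribution of Evaluator I, direct vision (Categories I-IV).
reference = [7, 23, 28, 38]
reference_ridits = ridits(reference)
print([round(r, 4) for r in reference_ridits])  # [0.0365, 0.1927, 0.4583, 0.8021]

# Weighted by its own frequencies, a reference distribution averages 0.5.
mean_ref = sum(n * r for n, r in zip(reference, reference_ridits)) / sum(reference)
print(round(mean_ref, 4))  # 0.5
```

Working from the raw counts rather than from rounded percentages avoids accumulating round-off in the later averages.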

It is these values in Column 4 which are used to calculate the
average Ridit value for an experimental distribution.

For example, using the distribution assigned by Evaluator I to
Technique A by direct vision:

                                       Ridit Value Previously
                     Number of         Calculated From
Category             Observations      Reference Distr.         Total

IV   Excellent       7            x    .8021                    5.6147
III  Good            5            x    .4583                    2.2915
II   Acceptable      4            x    .1927                     .7708
I    Poor            0            x    .0365                    0.0
                                                                ------
                                                                8.6770

divided by # of observations (16)

average Ridit for Technique A for Evaluator I                   .542313

Thus, the average Ridit value of Technique A, as
evaluated by Operator I by direct vision, is .542313.
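As a check on the arithmetic, the averaging step can be sketched as follows. The helper name `mean_ridit` is hypothetical; the Ridit values are the four-decimal figures given in the text, and the counts are those assigned by Evaluator I to Technique A, ordered from Category I to Category IV.

```python
# Ridit values of Categories I-IV from the reference distribution,
# rounded to four decimals as in the text.
REFERENCE_RIDITS = [0.0365, 0.1927, 0.4583, 0.8021]


def mean_ridit(group_counts, ridit_values):
    """Average Ridit of a group: each category's Ridit weighted by
    the number of the group's observations in that category."""
    n = sum(group_counts)
    weighted = sum(c * r for c, r in zip(group_counts, ridit_values))
    return weighted / n


# Technique A, Evaluator I, direct vision: 0 Poor, 4 Acceptable,
# 5 Good, 7 Excellent.
technique_a = [0, 4, 5, 7]
print(round(mean_ridit(technique_a, REFERENCE_RIDITS), 4))  # 0.5423
```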

The average Ridit of any reference distribution is always .5000.
For example:


Reference Distribution          Ridit Value

IV       38        x       .8021       30.4798
III      28        x       .4583       12.8352
II       23        x       .1927        4.4321
I         7        x       .0365         .2555
         --                            -------
total    96                            47.8915

47.8915 divided by 96 = .4989

(The value of .4989 differs from .5000 only due to round-off error.)

The value of .5423 can be interpreted as meaning that the average
specimen produced by Technique A is extremely close to the average
specimen from the reference distribution.

It is possible to compare experimental groups by comparing
their Ridit values.

For  example: 

Technique A has an average Ridit value of .5423 and Technique C has a
value of .1419. The probability of A producing a better (sic) specimen is
.5423 - .1419 = .4004 + .5, or .9004. (If the Ridits were equal, the
chance of either producing a better specimen than the other would be .5;
i.e., .6523 - .6523 = 0 + .5 = .5.)
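The comparison rule just stated is simple enough to express directly. The sketch below only restates the arithmetic of the text; the function name `prob_better` is hypothetical.

```python
def prob_better(mean_ridit_x, mean_ridit_y):
    """Estimated chance that a specimen from group X is better than
    one from group Y: the difference of the average Ridits plus 0.5."""
    return mean_ridit_x - mean_ridit_y + 0.5


# Technique A (.5423) against Technique C (.1419).
print(round(prob_better(0.5423, 0.1419), 4))  # 0.9004

# Equal Ridits give an even chance.
print(prob_better(0.6523, 0.6523))  # 0.5
```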

The Ridits for different categories can then be compared using
t-tests or, as one investigator has done, by using parametric ANOVA
techniques.⁶

RESULTS 

Ordinal  Testing: 

The rank ordering of the 96 photos by the 3 evaluators at 2 different
times yielded 6 data sets that could be compared. A Kendall coefficient
of concordance (Table 3) amongst the sets indicated that all the ranking
sets were similar.
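For readers unfamiliar with the coefficient, a minimal sketch of Kendall's W follows. It assumes complete rankings with no ties, and the three short rankings are hypothetical illustrations, not data from this study; the function name `kendall_w` is likewise an assumption.

```python
def kendall_w(rankings):
    """Kendall coefficient of concordance for m complete rankings of
    the same n items (no ties assumed).

    rankings: list of m lists, each holding the ranks 1..n that one
    evaluator gave to the n items, in a fixed item order.
    """
    m = len(rankings)
    n = len(rankings[0])
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    mean_sum = sum(rank_sums) / n
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of four items by three evaluators.
example = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
print(round(kendall_w(example), 4))  # 0.7778
```

W runs from 0 (no agreement) to 1 (identical rankings), which is why values such as those in Table 3 are read as strong agreement.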

Other correlations done between evaluators, and within operators between
times of evaluation, were also strongly significant (Table 3). Based on the
agreement amongst evaluators at one time period, the 3 sets of ranks for each
time period were averaged to yield an average rank for each observation. This
data set was analyzed with ordinal tests to attempt to determine differences
in the techniques. Mann-Whitney U tests were done to compare all pairs
of treatment groups, and a simultaneous test procedure was done based on
the results (Table 4).

The mean ranks are shown connected by underlinings where they cannot
be shown to be different. Groups d, e, b, a were different from Group f,
which was different from Group c, at the .05 level of probability.

Categorical Testing:

Table 5 lists the means and standard deviations of the Ridit scores
for both direct and photographic evaluation.

These same data are displayed in Figures 1, 2, and 3 to illustrate the
graphic possibilities of this type of data summary.

The results of the 3-way ANOVA of the Ridits are shown in Table 6.

The 3-way interaction was significant, indicating that the Ridit scores for
the various cells were affected differently by the various combinations of
mode of evaluation and operator. This interaction precludes testing the
main effects. The data were separated by operator and method of
evaluation, and reanalyzed using 1-way analyses of variance. The post hoc
tests based on these ANOVAs are presented in Table 7.

An arcsine transformation was done on the data, an accepted technique
to normalize percentage or proportional data. The 3-way ANOVA results were
similar; the 3-way interaction was significant (Table 8).
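The text does not state which form of the arcsine transformation was applied. A common variant for proportions is the arcsine of the square root, sketched below under that assumption; the function name `arcsine_transform` is hypothetical.

```python
import math


def arcsine_transform(p):
    """Arcsine square-root transformation of a proportion p in [0, 1].

    Assumed variant: asin(sqrt(p)), in radians. The source does not
    say which form the authors used.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return math.asin(math.sqrt(p))


# Ridit scores are proportions, so they can be transformed directly.
for score in (0.0365, 0.1927, 0.4583, 0.8021):
    print(score, round(arcsine_transform(score), 4))
```

The transformation stretches values near 0 and 1, where the variance of a proportion is smallest, before a parametric analysis is applied.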

Paired-t comparisons were made between the Ridit values for the
techniques for each operator according to the mode of evaluation. There was
no significant difference for any operator that could be attributed to the
manner of evaluation, direct or photographic.

DISCUSSION 

The comparison of the same data base by two different statistical
techniques provided some valuable insights about several problems that
face clinical evaluators. 1) Can categorical ranking scales be compared
between operators? 2) Are ordinal rankings more worthwhile and powerful
in separating sample groups in clinical studies? 3) Is the Ridit
statistic, which is intuitively attractive and easily understandable,
as useful as the less intuitive ordinal test methods?

The ranked data, 6 sets of observations from 1-96, were evaluated
first by measures of correlation (Table 3), which indicated that each
operator at each time ranked the samples in essentially the same order.
There was a statistically significant agreement within all possible pairs.

The degree of agreement amongst evaluators ranged from .57 to .85
and by evaluators with themselves from .7/ to .91, agreeing fairly well
with Goldman's figures.¹

When the pooled ranks at each time period were analyzed using the
post-hoc test (Table 4), the analysis at the two time periods yielded
the same mean rank orders for the treatment groups.

The influence of time or operator seems to be negligible using photo
ranking techniques. This is in agreement with other conclusions on this
technique.⁹

The Ridit values were analyzed in a 3-way design (technique x operator
x mode of evaluation). The ANOVA table indicated significant interaction
(Tables 6 and 8). This conclusion is not intuitively acceptable when
Figures 1-3 are compared. These results may be ascribed to
characteristics peculiar to Ridit scores.

In our data base, the homogeneity of results within each technique
group resulted in some techniques having all observations in the same
rank category. This yields a cell mean for analysis of variance with a
zero variance. This seems to have an unfavorable effect on the analysis
of variance by causing small differences in actual operator/mode
evaluation levels to be unnecessarily prominent statistically.

In  effect,  the  small  number  of  categories  implies  a  precision  of 
evaluation  that  does  not  exist. 

The pitfall of Ridit analysis seems to be that it implies an accuracy
of evaluation that is not really present. Since the Ridit value for any
category can be extended to as many significant digits as is convenient,
the illusion of precision may be increased at will. In a practical sense,
if Evaluator 2 were to assign all of one set of specimens to Category IV,
the average Ridit would be .8021 with a standard deviation of 0. If, on
subsequent evaluation, he evaluated the same group and assigned 13 to
Category IV and 3 to Category III, the average Ridit for the group would
be .7376 with a s.d. of .13859. This difference (t = 1.86) is statistically
significant at between .05 and .10, and intuitively there seems to be a
difference between the numbers .8021 and .7376; yet with the loose
criteria, it is very easy to allow certain borderline cases to fit into
either one of two categories. Thus a single observation could be given a
value of either .8021 or .7376; this will cause an effect on the mean and
standard deviation out of proportion to the deviation of opinion which
caused it. The precision, which is implied by the four significant figures
after the decimal point, should be recognized as being only a construct
unrelated to the precision of the evaluation technique.
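The figures in this example can be reproduced with the standard library. The sketch below assumes that the quoted t value is a one-sample comparison of the second evaluation against the earlier mean of .8021; the text does not name the test, so that reading is an assumption.

```python
import math
import statistics

# Second evaluation: 13 specimens in Category IV (.8021) and
# 3 specimens in Category III (.4583).
scores = [0.8021] * 13 + [0.4583] * 3

mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1)

# Assumed test: one-sample t against the first evaluation's mean,
# which had zero variance.
t = (0.8021 - mean) / (sd / math.sqrt(len(scores)))

print(round(mean, 4), round(sd, 5), round(t, 2))
```

Moving only three of sixteen borderline specimens down one category is enough to produce a t value near the conventional significance threshold, which is the distortion the paragraph describes.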

Because of the significant interaction in the main ANOVA, the data
were reevaluated to judge the degree of separation of the techniques when
the operators and mode of evaluation were all considered separately.

Individual analysis of the Ridit scores for each mode/evaluator
combination showed that distinctions drawn between the treatment groups
are essentially the same for each operator and mode. Ridit scores for the
different techniques can be seen to be approximately the same between
operators. This is because the Ridit analysis is essentially a
rank-ordering technique related to the Wilcoxon rank sum test. Minor
differences between Ridit scores for the same categories are due to minor
discrepancies in applying the "sliding scale" of criteria by the
evaluators.

The Ridit analysis done by direct vision gave the same relative Ridit
scores to the various techniques, resulting in the same separations as the
Ridit evaluation by photo (Table 7). It can be inferred that even such
simple criteria as listed in Table 1 can be applied evenly both in
pictures and by direct vision with some confidence that undue error is
not introduced.

Thus, although the Ridit analysis is "self-correcting" to some extent,
the appearance of precision implied by the Ridit score for each category
is actually specious. The very broadness of the category decreases the
discriminative ability of the Ridit analysis, and the implied precision
makes complex mathematical treatment (as in multiway ANOVA) problematical.
Ridit analysis seems to be best suited to graphical display of data,
simple inference, and as an intuitively appealing index.

CONCLUSIONS: 

The conclusions that were drawn from this comparison are as follows:

1) Ridit analysis is a remarkably attractive tool for use in
categorical-ordinal situations. It is partially self-correcting between
operators, but the overly precise "appearance" (e.g., .8021, .0365, etc.)
of the Ridit score implies an accuracy of measurement which does not
exist.

2) The use of Ridit analysis in graphic representation of results seems
to be far more trustworthy than extensive analysis using parametric tests.

3) The Ridit analysis provides a manageable way of characterizing the
value of non-normal distributions and provides a useful measure of
"central tendency" for this type of data. The traditional ordinal tests
provide as much, or more, information about the relative worth of
different samples but, in this case, required a great expenditure of time
and resources to provide the pictures for ranking.

4) The Ridit characterizations were done with equal certainty and with
equally sound results by direct vision without the need for reproduction.

5) Classification of root canal fillings into categories based on
subjective assessment by direct vision through a microscope was shown to
provide the same relative results as evaluation of pictures of the same
field. Differences between operators, time evaluated, and method of
result analysis produced statistically significant but small differences
that did not affect the relative rankings of the various root canal
filling techniques employed.

Table 1

Criteria of Nominal Classification

Class I:    (Poor)         No Apical Replication          Wrinkles & Folds        No Fins
Class II:   (Acceptable)   Poor Apical Replication        Some Wrinkles & Folds   Minimal Fins
Class III:  (Good)         Good Apical Replication        Few Wrinkles & Folds    Fins
Class IV:   (Excellent)    Excellent Apical Replication   No Wrinkles & Folds     Fins

Table 2a

Example Distribution of Ordinal/Categorical Data

DIRECT VISION EVALUATION

Operator I

              I       II      III     IV
Technique     Poor    Fair    Good    Excellent    Total
a             -       4       5       7            16
b             -       2       3       11           16
c             7       9       -       -            16
d             -       -       7       9            16
e             -       1       5       10           16
f             -       7       8       1            16
TOTAL         7       23      28      38           96

Operator II

              I       II      III     IV
Technique     Poor    Fair    Good    Excellent    Total
a             1       0       9       6            16
b             -       1       2       13           16
c             12      4       -       -            16
d             -       -       5       11           16
e             -       -       3       13           16
f             -       10      6       -            16
TOTAL         13      15      25      43           96

Operator III

              I       II      III     IV
Technique     Poor    Fair    Good    Excellent    Total
a             1       -       10      5            16
b             -       1       3       12           16
c             11      5       -       -            16
d             -       -       4       12           16
e             -       -       2       14           16
f             -       10      6       -            16
TOTAL         12      16      25      43           96

Table  2b 


Photographic  Evaluation 


Operator  I 

I 

II 

III 

IV 

Technique 

Poor 

Fair 

Good 

Excellent

a 

1 

3 

4 

8 

b 

1 

2 

5 

8 

c 

11 

6 

d

1 

15 

e 

2 

8 

6 

f 

7 

8 

1 

i 

Operator  II 

a 

1 

5 

3 

1 

b 

1 

2 

7 

6 

c 

14 

2 

d 

1 

15 

e 

2 

2 

11 

1 

f 

2 

8 

4 

2 

Operator  III 

a 

3 

7 

6 

b 

16 

c 

16 

d 

6 

10 

e 

16 

f 

6 

10 

Table 3

Analysis of Ordinal Data

Kendall Coefficient of Concordance

degree of agreement amongst all 3 operators of pictures ranked 1-96

    initial evaluation    .7746     p < .0000
    second evaluation     .8427     p < .0000

Spearman Rho

degree of agreement between evaluators

    Evaluators    Rho value    t value    p

  initial evaluation
    1 vs 2        .8496        11.4       p < .000
    1 vs 3        .4869         6.88      p < .000
    2 vs 3        .5703         7.32      p < .000

  second evaluation
    1 vs 2        .6916         8.32      p < .000
    1 vs 3        .86667       12.05      p < .000
    2 vs 3        .7332         8.84      p < .000

degree of agreement within operator between initial and second evaluation

    Evaluator     Rho value    t value    p
    1             .6730         8.13      p < .000
    2             .9119        14.65      p < .000
    3             .7721         9.46      p < .000


Table 4

Separation of Techniques by STP Based on
Mann-Whitney U Using Ordinal Data

Initial evaluation

technique           d       e       b       a       f       c
mean pooled rank    64.4    110.3   126.0   127.5   202.4   241.2

p < .05

Second evaluation

technique           d       b       e       a       f       c
mean pooled rank    73.4    105.6   112.8   132.0   194.3   262.6

p < .05



Table 5

Ridit Scores

                     EVALUATOR 1          EVALUATOR 2          EVALUATOR 3
Technique            Direct   Picture     Direct   Picture     Direct   Picture

A          mean      .5423    .5938       .5326    .5801       .5101    .4753
           s         .2578    .2578       .2136    .2698       .2052    .2471

B          mean      .6615    .6136       .6966    .6078       .6731    .7400
           s         .2282    .2401       .1763    .2302       .1884    .0000

C          mean      .1419    .1146       .1042    .1302       .1081    .099
           s         .0593    .0746       .0652    .0711       .0648    .0000

D          mean      .6417    .7878       .6653    .8308       .6875    .6191
           s         .1775    .0573       .1695    .0729       .1584    .1745

E          mean      .6351    .6172       .7096    .4883       .7317    .7500
           s         .2056    .1837       .1427    .1896       .1209    .0000

F          mean      .3636    .4414       .2917    .4147       .2884    .3066
           s         .1767    .1921       .1042    .2214       .1068    .0755

Table 6

3-Way Analysis of Variance for Ridit Scores
(raw data)

                                   Sum of Squares    DF     MS       F          P

Main Effects
  technique                        23.372            5      4.674    155.034    .000
  evaluator                          .004            2       .002       .061    .941
  mode of evaluation                 .000            1       .000       .000    .989

2-way interaction
  technique x evaluator              .924            10      .092      3.066    .001
  technique x mode                   .864            5       .173      5.731    .005
  evaluator x mode                   .004            2       .002       .058    .543

3-way interaction
  technique x evaluator x mode      1.029            10      .103      3.414    .000

residual                           16.281            540     .030



Table 7

Separation Produced by Individual ANOVA
+ Student-Newman-Keuls Test (at .05 Confidence Level)

Direct Vision

Evaluator    Technique       F Value of ANOVA    P

1            d b e a f c     6.517               .0000
2            e b d a f c     3.540               .0037
3            e d b a f c     3.681               .0027

Pictures

1            d b a e f c     ?.904               .0134
2            d b e a f c     7.217               .0000
3            e b d f a c     3.028               .0104



Table 8

3-Way Analysis of Variance
(data transformed with arcsine function)

                                   Sum of Squares    DF     MS        F       P

Main Effects
  technique                        31.671            5      6.334     140     .0000
  evaluator                          .00414          2       .002        .05  .9553
  mode of evaluation                 .00004          1       .0004       .00  .9769

2-way interaction
  technique x evaluator             1.44449          10      .14445     3.19  .0005
  technique x mode                  1.30621          5       .27724     6.13  .0000
  evaluator x mode                   .00412          2       .00206      .05  .9554

3-way interaction
  technique x evaluator x mode      1.60539          10      .16054     3.55  .0001

residual                           24.43137          540     .04522



Legend

Figures 1, 2, and 3

Average Ridit scores and 95% confidence intervals illustrate the ease
of graphic interpretation of a Ridit score.